Persistent Semi-Dynamic Ordered Partition Index
Authors
Abstract
Similarity search is a popular paradigm in advanced database applications. In content-based image retrieval (CBIR), for example, images are transformed into feature vectors, which are then used for similarity search via k-nearest-neighbor (k-NN) queries in the feature vector space. Clustering by building a disk-resident index is one method to speed up the processing of k-NN queries. In the case of high-dimensional feature vectors, the dimensionality curse results in a high degree of overlap among the minimum bounding rectangles of the index, which results in most pages of the index being accessed. This is especially detrimental to performance, since disk positioning time for random disk accesses is high and improving only at a rate of 8% annually. We propose an alternative solution to indexing high-dimensional data, which takes advantage of increasing main memory sizes and the 40% annual improvement in disk transfer rates. More specifically, we make the Ordered Partition tree (OP-tree), which is a main-memory-resident index, persistent by writing it onto disk. We investigate the optimization of OP-tree parameters and compare its performance with the sequential scan method with and without the Karhunen–Loève transformation. We use serialization to compact the dynamically allocated nodes of the OP-tree in main memory, which form a linked list, into a contiguous area. The index can then be saved on disk as a single file and loaded into main memory by a single transfer. The original OP-tree is static, so we propose several methods to support the insertion of new points dynamically. We compare these methods from the viewpoints of time and space efficiency. We also study the effect of incrementally building the index with and without applying the Karhunen–Loève transformation. We compare the processing time of k-NN queries on persistent OP-trees and SR-trees to demonstrate the viability of the proposed method. We use one synthetic and three real-world datasets in our experiments.
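The serialization step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's OP-tree layout: the `Node` class, the 16-byte record format, and the `save`/`load` helpers are all hypothetical, and a plain binary tree stands in for the index. The point is only that pointer-linked nodes are rewritten as fixed-size records in one contiguous buffer, with child pointers replaced by record indices, so the whole structure can be written and read back in a single transfer.

```python
import struct
from dataclasses import dataclass
from typing import Optional


# Hypothetical node type; a stand-in for dynamically allocated index nodes.
@dataclass
class Node:
    key: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# Assumed record layout: key (float64), left index, right index (int32).
# An index of -1 means "no child".
RECORD = struct.Struct("<dii")


def serialize(root: Optional[Node]) -> bytes:
    """Flatten a pointer-linked tree into one contiguous byte string."""
    records = []  # each entry is [key, left_index, right_index]

    def visit(node: Optional[Node]) -> int:
        if node is None:
            return -1
        idx = len(records)
        records.append([node.key, -1, -1])  # reserve the slot in preorder
        records[idx][1] = visit(node.left)
        records[idx][2] = visit(node.right)
        return idx

    visit(root)
    return b"".join(RECORD.pack(k, l, r) for k, l, r in records)


def deserialize(buf: bytes) -> Optional[Node]:
    """Rebuild the linked nodes from the contiguous records."""
    if not buf:
        return None
    rows = [RECORD.unpack_from(buf, off)
            for off in range(0, len(buf), RECORD.size)]
    nodes = [Node(k) for k, _, _ in rows]
    for node, (_, l, r) in zip(nodes, rows):
        node.left = nodes[l] if l >= 0 else None
        node.right = nodes[r] if r >= 0 else None
    return nodes[0]  # the root is always the first record


def save(root: Optional[Node], path: str) -> None:
    with open(path, "wb") as f:
        f.write(serialize(root))  # one sequential write of the whole index


def load(path: str) -> Optional[Node]:
    with open(path, "rb") as f:
        return deserialize(f.read())  # one sequential read of the whole index
```

Replacing pointers with indices keeps the buffer position-independent, which is what allows it to be loaded at any address. A real index would also need to store the data points and any per-node partition information.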
Similar resources
Combined Dynamic Arrays for Storing and Searching Semi-Ordered Tandem Mass Spectrometry Data
When performing bioinformatics analysis on tandem mass spectrometry data, there is a computational need to efficiently store and sort these semi-ordered datasets. To solve this problem, a new data structure based on dynamic arrays was designed and implemented in an algorithm that parses semi-ordered data made by Mascot, a separate software program that matches peptide tandem mass spectra to pro...
On-line Chain Partitions of Up-growing Semi-orders
On-line chain partition is a two-player game between Spoiler and Algorithm. Spoiler presents a partially ordered set, point by point. Algorithm assigns incoming points (immediately and irrevocably) to the chains which constitute a chain partition of the order. The value of the game for orders of width w is a minimum number val(w) such that Algorithm has a strategy using at most val(w) chains on...
Enhancements to the Voting Algorithm
There are several consistency control algorithms for managing replicated files in the face of network partitioning due to site or communication link failures. In this paper, we consider the popular voting scheme along with three enhancements: voting with a primary site, dynamic voting, and dynamic voting with linearly ordered copies. We develop a stochastic model which compares the file availab...
$EL^2$–hyperstructures Derived from (Partially) Quasi Ordered Hyperstructures
In this paper, we introduce a new class of (semi)hypergroups from a given (partially) quasi-ordered (semi)hypergroup as a generalization of "$EL$-hyperstructures". Then, we study some basic properties and important elements belonging to this class.
HiKV: A Hybrid Index Key-Value Store for DRAM-NVM Memory Systems
Hybrid memory systems consisting of DRAM and Non-Volatile Memory are promising to persist data fast. The index design of existing key-value stores for hybrid memory fails to utilize its specific performance characteristics: fast writes in DRAM, slow writes in NVM, and similar reads in DRAM and NVM. This paper presents HiKV, a persistent key-value store with the central idea of constructing a hy...
Journal:
- Comput. J.
Volume 49, Issue
Pages -
Publication date: 2006